by Peter de Blanc + ChatGPT Deep Research
Posted to Adarie (www.adarie.com) on July 22, 2025
Content License: Creative Commons CC0 (No Rights Reserved)
Generating step charts (beatmaps) from audio using deep learning has seen active research and development across various rhythm games. Recent methods typically use neural networks to predict timing (when notes occur) and pattern selection (which actions or positions) for games like DDR/StepMania, Osu!, Beat Saber, and others. Crucially, many systems incorporate difficulty modeling – allowing generation of easier or harder charts – and have been evaluated in game-like settings for playability. Below is a structured survey of notable deep learning–based approaches, organized by game type, with their key features, targeted games, ML techniques, and notes on difficulty and output quality.
Dance Dance Convolution (2017) – One of the first deep learning models for Dance Dance Revolution (DDR) charts. It uses a two-stage neural network: a CNN+RNN (LSTM) to predict step timings from audio spectrograms, and a conditional LSTM to decide which arrow to press at each timing. The model is explicitly conditioned on chart difficulty so it can generate charts at different difficulty levels. The authors trained on thousands of human-made StepMania charts and even released a StepMania demo where players uploaded songs and selected a difficulty; the system would produce a playable chart in seconds. In preliminary user tests, generated charts were playable and reasonably enjoyable (average satisfaction ~3.87/5). Notably, DDC proved that neural networks can learn both the rhythmic structure and the choreography patterns of DDR, producing results comparable to human-made charts in terms of playability. It performed best on higher difficulties (where more training data was available) and can create multiple distinct charts per song, addressing the lack of a single “ground truth” chart. This work established a baseline for “learning to choreograph” in rhythm games.
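To make the two-stage split concrete, here is a minimal sketch (in PyTorch) of how an onset network and a difficulty-conditioned step-selection network could be wired together. The class names, layer sizes, and the 256-symbol step vocabulary are illustrative assumptions for this survey, not DDC's actual implementation.

```python
# Illustrative sketch of the two-stage split: stage 1 predicts *when* a step
# occurs, stage 2 predicts *which* arrows. Not the authors' code.
import torch
import torch.nn as nn

class OnsetNet(nn.Module):
    """CNN over spectrogram frames + BiLSTM over time -> step/no-step probability per frame."""
    def __init__(self, n_mels=80, n_diff=5, hidden=128):
        super().__init__()
        self.conv = nn.Sequential(
            nn.Conv2d(1, 16, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),
            nn.Conv2d(16, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.MaxPool2d((1, 2)),
        )
        feat = 32 * (n_mels // 4)
        self.diff_emb = nn.Embedding(n_diff, 16)            # difficulty conditioning
        self.rnn = nn.LSTM(feat + 16, hidden, batch_first=True, bidirectional=True)
        self.out = nn.Linear(2 * hidden, 1)                  # P(step at this frame)

    def forward(self, spec, difficulty):
        # spec: (batch, time, n_mels); difficulty: (batch,) integer level
        x = self.conv(spec.unsqueeze(1))                     # (B, 32, T, n_mels/4)
        x = x.permute(0, 2, 1, 3).flatten(2)                 # (B, T, 32 * n_mels/4)
        d = self.diff_emb(difficulty).unsqueeze(1).expand(-1, x.size(1), -1)
        h, _ = self.rnn(torch.cat([x, d], dim=-1))
        return torch.sigmoid(self.out(h)).squeeze(-1)        # (B, T) onset probabilities

class StepNet(nn.Module):
    """Autoregressive LSTM over selected onset times -> one of 256 arrow combinations
    (4 arrows x 4 states is an assumption about the step vocabulary)."""
    def __init__(self, n_combos=256, n_diff=5, hidden=128):
        super().__init__()
        self.step_emb = nn.Embedding(n_combos, 64)
        self.diff_emb = nn.Embedding(n_diff, 16)
        self.rnn = nn.LSTM(64 + 16 + 1, hidden, batch_first=True)  # +1 for time gap
        self.out = nn.Linear(hidden, n_combos)

    def forward(self, prev_steps, gaps, difficulty):
        # prev_steps: (B, N) previous arrow combos; gaps: (B, N) seconds since last step
        d = self.diff_emb(difficulty).unsqueeze(1).expand(-1, prev_steps.size(1), -1)
        x = torch.cat([self.step_emb(prev_steps), d, gaps.unsqueeze(-1)], dim=-1)
        h, _ = self.rnn(x)
        return self.out(h)                                   # logits over next arrow combo
```

At generation time, the onset network's per-frame probabilities would be thresholded or peak-picked to choose step times, and the step network would then be sampled autoregressively at those times.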
Udo et al. (2020–2023, Hokkaido Univ.) – A series of works focusing on difficulty-controlled chart generation for DDR/StepMania. Their system first uses a deep neural network (trained on StepMania data) to detect musically salient instants (onsets) and produce an initial step chart, given an audio clip and a desired difficulty level. They then apply a refinement filter that removes or prunes steps to match the target density/difficulty. Essentially, the DNN predicts a high-density chart (often too difficult) and the filter uses a reference targets-per-measure (TPM) profile to sparsify the chart for lower difficulties. This approach ensures the number of arrows (note density) aligns with the chosen difficulty without simply scaling uniformly – it preserves rhythmically important notes while dropping others. Udo et al. reported that their difficulty-controlled charts better match intended difficulty levels (including generating easier charts with “sparse target density” that still follow the music). This research, culminating in a 2023 journal paper, demonstrates a near-production technique for automatically generating multi-level DDR charts, with an emphasis on accurate difficulty tuning.
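A rough sketch of the refinement idea: take the dense chart predicted by the network and, measure by measure, keep only the most confident steps until the target notes-per-measure count is met. The function below is an assumed interface for illustration, not Udo et al.'s code.

```python
# Sketch of density refinement: start from a dense predicted chart and keep only
# the highest-confidence steps in each measure until the target
# "targets per measure" (TPM) count is reached.
from collections import defaultdict

def prune_to_tpm(steps, seconds_per_measure, target_tpm):
    """steps: list of (time_sec, confidence, arrows); target_tpm: desired notes per measure."""
    by_measure = defaultdict(list)
    for time_sec, confidence, arrows in steps:
        measure = int(time_sec // seconds_per_measure)
        by_measure[measure].append((confidence, time_sec, arrows))

    kept = []
    for candidates in by_measure.values():
        # keep the most confident (most musically salient) steps first
        candidates.sort(key=lambda c: c[0], reverse=True)
        kept.extend((t, conf, arrows) for conf, t, arrows in candidates[:target_tpm])
    return sorted(kept, key=lambda s: s[0])  # back in chronological order
```

Because pruning is per measure and confidence-ordered, an easier chart retains the rhythmically strongest steps rather than being a uniform subsample of the hard chart.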
Beat Sage (2020) – A widely used AI web tool for automatically generating Beat Saber maps from any song. Beat Sage’s pipeline uses two neural networks trained on thousands of human-made Beat Saber maps. The first network analyzes the audio and predicts when to place blocks (note onsets), and the second network assigns a block type and direction to each timing (e.g. a left-hand up-swing, a right-hand down-swing, or simultaneous blocks for both hands). The system supports multiple difficulty levels – at launch it could generate maps in Normal, Hard, Expert, and Expert+ difficulties, mimicking the style of official Beat Saber tracks. In practice, users can select a difficulty and the AI produces a chart at that level. Beat Sage is considered close to production quality: reviewers noted it generates shockingly good results for many songs, often comparable in consistency and fun to community-made maps. The maps tend to match the beat and flow of the music well (especially for electronic or strongly rhythmic songs), though very complex or slow songs remain challenging. The developers (C. Donahue and A. Agarwal) continue to refine the model to avoid rare mapping issues (e.g. awkward patterns or vision blockers). Beat Sage demonstrates a practical, high-quality application of deep learning in a commercial-style rhythm game, with user-tunable difficulty and broad usage (it has been used to generate tens of thousands of custom Beat Saber levels).
DeepSaber (2020) – A research project (Master’s thesis) that built upon DDC to generate Beat Saber charts. DeepSaber implemented a multi-component deep learning approach to handle the high-dimensional choreography of VR rhythm games (which involve placement in 3D space and hand-specific notes). It introduced the idea that “beat maps are sentences; actions are words”, using NLP-inspired techniques. For example, the author created action embeddings (using Word2Vec/FastText) for Beat Saber blocks to encode their similarity/semantics. The model itself was a multi-stream LSTM architecture that ingests multiple inputs – audio features (MFCC), beat information, and even partial chart context (including difficulty as a feature) – to predict sequences of block placements. DeepSaber also proposed new evaluation metrics: a local metric based on the action embeddings (to see if generated patterns locally resemble known good patterns) and a global novelty/diversity metric to compare the distribution of generated patterns to human ones. In comparisons, DeepSaber’s results were measured against other AI mappers (including an Oxford VR lab model and Beat Sage) on Expert-level songs. While primarily a research exploration, it showed that advanced deep learning (with embedding techniques and multi-input LSTMs) can generate reasonably playable Beat Saber maps. Some informal feedback noted DeepSaber produced more predictable patterns at lower difficulties, while other tools handled higher-complexity patterns better, suggesting potential to specialize models per difficulty. Overall, DeepSaber contributed novel ideas (pattern embeddings, multi-input networks) to improve AI choreography for VR rhythm games.
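The "actions are words" idea can be illustrated in a few lines of gensim: treat each map as a sentence of action tokens and let Word2Vec learn an embedding in which blocks that appear in similar contexts end up close together. The token format below is a made-up stand-in for whatever encoding the thesis actually used.

```python
# Sketch of action embeddings for Beat Saber blocks (assumed token format).
from gensim.models import Word2Vec

# hypothetical token format: hand_column_row_cutDirection, one token per block event
example_maps = [
    ["L_0_0_up", "R_3_0_down", "L_1_1_left", "R_2_1_right"],
    ["L_0_0_up", "R_3_0_down", "both_1_0_any"],
]

model = Word2Vec(sentences=example_maps, vector_size=64, window=5, min_count=1, sg=1)
vec = model.wv["L_0_0_up"]                           # 64-dim embedding for this action
similar = model.wv.most_similar("L_0_0_up", topn=3)  # actions used in similar contexts
```

With a real corpus of maps rather than this toy input, nearest neighbors in the embedding space would surface blocks that tend to co-occur in the same patterns, which is the property DeepSaber's local evaluation metric builds on.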
Sypteras “AIsu” (2018) – An early open-source project by Nick Sypteras that generates Osu! standard mode beatmaps (clickable circles) from audio. This system uses deep learning to decide when to place a hit object, and some procedural logic for where to place it on the screen. Specifically, Sypteras trained a CNN-based model on hundreds of high-quality Osu! beatmaps, feeding in mel-scaled spectrogram windows to classify each time frame into one of four classes: no note, note in medium difficulty only, note in hard difficulty only, or notes in both. By training on song segments that had paired medium and hard charts, the model effectively learned to generate two difficulty levels simultaneously. The output “hit vectors” for medium and hard were then decoded into two separate beatmaps. To position the circles, AIsu did not use a fully learned approach; instead, it employed a Markov chain and heuristic rules to arrange notes in patterns, aiming to mimic the flow of human-designed patterns. (For example, it would choose note coordinates based on a learned distribution of angles and distances between consecutive notes.) The resulting maps were not perfect, but they were playable and demonstrated the feasibility of end-to-end generation of Osu! levels. The project’s web demo allowed users to input a song and get an Osu! beatmap with two difficulties. This work is notable for explicitly modeling difficulty (medium vs hard) and for being one of the first community-driven deep learning beatmap generators, paving the way for more advanced models.
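A small sketch of how a single four-way classifier yields two difficulties at once: each frame's predicted class is routed into a medium hit vector, a hard hit vector, or both. The class indices and frame rate below are assumptions for illustration, not the project's actual values.

```python
# Sketch of decoding 4-class frame predictions into two difficulty-specific hit vectors.
import numpy as np

NO_NOTE, MEDIUM_ONLY, HARD_ONLY, BOTH = 0, 1, 2, 3

def decode_hit_vectors(frame_classes, frame_times):
    """frame_classes: per-frame argmax of the CNN's 4-way softmax."""
    medium_hits, hard_hits = [], []
    for cls, t in zip(frame_classes, frame_times):
        if cls in (MEDIUM_ONLY, BOTH):
            medium_hits.append(t)
        if cls in (HARD_ONLY, BOTH):
            hard_hits.append(t)
    return medium_hits, hard_hits

# example: 10 ms frames with stand-in predictions instead of real model output
times = np.arange(0, 1.0, 0.01)
classes = np.random.randint(0, 4, size=len(times))
medium, hard = decode_hit_vectors(classes, times)
```

The two hit vectors would then be passed to the separate placement stage (the Markov-chain heuristics) to produce the final medium and hard beatmaps.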
BeatLearning (sedthh, 2024) – A recent open-source initiative that leverages state-of-the-art generative AI (transformers) to create beatmaps for various rhythm games, with initial focus on Osu! standard mode. The system converts beatmaps into a tokenized sequence format called BEaRT (each 100ms slice of time is a token encoding up to two note events). Paired with each token sequence are corresponding audio features (Mel spectrogram slices). BeatLearning uses a transformer-based model (inspired by BERT and GPT) that is trained to predict the next token given the past tokens and some “future” audio context (a masked-language-modelling style training with an encoder-decoder mix). Essentially, it learns to generate the sequence of hit objects conditioned on the music. This model is designed to be flexible: it can in principle support different game formats (1, 2, or 4 track games are mentioned) and different difficulty levels. In fact, the beta release includes a front-end where a user can upload a song and select a desired difficulty, and the AI will produce a beatmap at that level. Early examples (medium and hard Osu! maps) show promising results, though some manual cleanup may still be needed for polish. BeatLearning is still a work in progress, but it aims to be a general foundation model for rhythm game chart generation. Its use of modern transformer techniques and a tokenizer approach is pushing the field toward more scalable and potentially more generalizable chart generators that could handle multiple games and tunable difficulty. (Notably, the project draws inspiration from Sypteras’s earlier “AIosu” as well as OpenAI’s sequence modeling techniques.)
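To illustrate the tokenizer idea, here is a toy 100 ms slice tokenizer: every slice becomes one integer token encoding up to two quantized note offsets. The actual BEaRT vocabulary and event encoding differ, so treat the id scheme below purely as an assumption.

```python
# Toy tokenizer in the spirit of BEaRT: one token per 100 ms slice,
# encoding up to two note events (id scheme is an assumption).
def tokenize_chart(hit_times, song_length_sec, slice_ms=100, max_events=2):
    """hit_times: hit-object times in seconds -> list of integer tokens, one per slice."""
    n_slices = int(song_length_sec * 1000 // slice_ms) + 1
    slices = [[] for _ in range(n_slices)]
    for t in hit_times:
        idx = int(t * 1000 // slice_ms)
        if len(slices[idx]) < max_events:
            # offset inside the slice, quantized to 10 ms bins (0..9)
            slices[idx].append(int((t * 1000) % slice_ms // 10))

    tokens = []
    for events in slices:
        if not events:
            tokens.append(0)                                 # empty slice
        elif len(events) == 1:
            tokens.append(1 + events[0])                     # ids 1..10: single hit
        else:
            tokens.append(11 + events[0] * 10 + events[1])   # ids 11..110: two hits
    return tokens
```

A transformer can then be trained to predict the next chart token from the previous tokens plus the aligned audio slices, which is the sequence-modelling setup described above.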
The table below summarizes the projects discussed above, along with a few related systems (Liang et al., TaikoNation, and GenéLive!), highlighting the game targets, AI approaches, difficulty handling, and notable results:
Project (Year) | Target Game(s) | Approach (Models) | Difficulty Support | Output Quality / Notes |
---|---|---|---|---|
Dance Dance Convolution (2017) | DDR (StepMania) 4-panel | CNN + LSTM for timing; conditional LSTM for steps (two-stage) | Yes – conditioned on difficulty input | Playable DDR charts; user demo with ~3.87/5 satisfaction. First deep learning DDR chart generator, baseline for later work. |
Udo et al. (2020–2023) | DDR (StepMania) 4-panel | CNN/LSTM onset detector + rule-based refinement filter | Yes – generates then prunes to target density | Multi-level charts with accurate note density for each level. Used a reference TPM (targets per measure) profile to match intended difficulty. |
Liang et al. (2019) | Osu!mania 4-key | BLSTM (C-BLSTM) sequence model, “fuzzy” labels for ambiguity | Yes – difficulty treated as input feature | Improved F-score (0.84) for timing; charts felt more natural than prior work. Focus on supervised PCG for 4-key mode. |
Beat Sage (2020) | Beat Saber (VR) | 2x Neural Nets (CNN/LSTM-style) – one for timing, one for block placement | Yes – supports Normal through Expert+ difficulties | Production-quality auto-mapper; generated maps often rival community maps in fun for suitable songs. Widely used via web. |
DeepSaber (2020) | Beat Saber (VR) | Multi-input LSTM; action-embedding + MLSTM architecture | Yes – difficulty and other features included | Research project (thesis). Introduced action “word” embeddings and novel metrics. Showed feasibility of ML for complex VR patterns. |
Sypteras “AIsu” (2018) | Osu! standard (clickable circles) | CNN classifier for hits + heuristic placement (Markov chain) | Yes – outputs two fixed difficulties (med & hard) simultaneously | First community DL mapper for Osu!. Web demo generated playable maps (required some manual tweaking). Established deep learning baseline for Osu!. |
BeatLearning (2024) | Osu! standard (expanding to others) | Transformer-based sequence generative model (BERT/GPT hybrid) | Yes – user selects difficulty; model conditioned on it | Ongoing open-source project. Early results show promising beatmaps (without sliders yet). Aims to be general foundation model for rhythm games. |
TaikoNation (2021) | Taiko (2-pad drum) | LSTM RNN focused on pattern sequence generation | Partial – focus on pattern quality, difficulty not main focus | Produced more human-like note patterns than prior ML approaches. Emphasized congruent patterns (key for playability in Taiko). |
GenéLive (2023) | Love Live! & similar (mobile, multi-track) | CNN + RNN (onset & sym modules) with beat guide and multi-scale CNN enhancements | Yes – handles all in-game difficulty modes | Deployed in production (KLab). Halved chart design time, charts meet commercial quality. Open-sourced model used in a live game. |
Sources: The information and outcomes above are drawn from academic papers, project reports, and developer discussions for each system. Each represents a milestone toward automating rhythm game content creation with deep learning, balancing musical alignment, difficulty, and fun.
From 2017’s pioneering Dance Dance Convolution to recent industry models like GenéLive! (2023), deep learning methods have rapidly advanced in generating beat game charts from audio. These systems commonly break the problem into predicting when notes should occur (like a musical onset task) and what actions or patterns to perform (a sequence generation task akin to language modeling). They often incorporate difficulty as a parameter – either by conditioning the model on difficulty or by post-processing the output – so that the generated charts can cater to various skill levels. Significantly, evaluations show that modern AI-generated charts can approach human-made quality: e.g. players were sometimes challenged to distinguish AI maps in terms of playability, and AI assistance is already speeding up professional chart design. While not every generated level is perfect, the trend is clear: leveraging CNNs, RNNs, and transformers (even diffusion models in latest attempts) has made automatic chart generation a practical reality. Ongoing work is improving musical structure awareness, pattern naturalness, and cross-game generalization, moving these tools ever closer to production-ready content creation across a variety of rhythm games. The convergence of academic research, open-source community projects, and commercial adoption suggests a bright future for AI-driven beatmap generation, where players can enjoy “infinite” new levels for their favorite songs at the click of a button.